Abstract: Ontology aligning is an answer to the problem of handling heterogeneous information in different domains. After some similarity measures are applied, one obtains a set of similarity values; the final goal is to extract mappings from them. Our contribution is a new genetic algorithm (GA) based extraction method. The GA employs a structure-based weighting model, named the "coincidence based model", as its fitness function. In the first part of the paper, some preliminaries and notations are given, and then we introduce the coincidence based weighting. In the second part, the paper discusses the details of the devised GA, with evaluation results for a sample dataset.



1 INTRODUCTION 

The Semantic Web is made up of distributed information, which has led to the design of ontologies to lessen heterogeneity. Yet another problem exists: ontologies themselves may cause heterogeneity. This happens when two ontologies try to express the same knowledge or concepts but use different languages or words (Euzenat and Valtchev, 2004). This has led to the problem of ontology aligning.

Ontology aligning has been discussed for a while as a way to map heterogeneous information in different domains. It lets agents that use different ontologies establish interoperability. An alignment gives a correspondence between the concepts and semantics of two ontologies.
To build an alignment, it is customary to first apply some measures (simple or complex) to reach some initial guesses. The problem is then how to form an ideal mapping. This problem, referred to as Mapping Extraction, is the target of this paper.
After a review of related work in section 2, section 3 gives the definitions and notations used in the paper. Section 4 then discusses our coincidence based theory, which is the basis for the GA discussed in section 5. Section 6 explains the evaluations done on the algorithm, and finally section 7 concludes the paper.



2 RELATED WORKS 

Unfortunately, there are not many works on mapping extraction, as stated in (Bouquet et al., 2004). In (Melnik et al., 2002), the Stable Marriage problem (Gibbons, 1985) is discussed as a way to extract a reasonable mapping. There are other approaches as well: a machine learning approach to the problem is discussed in (Doan et al., 2003), and (Mitra et al., 2003) describe a probabilistic model. Some methods trade off different features such as efficiency and quality, as in QOM (Ehrig and Staab, 2003), and some integrate various similarity methods (Ehrig and Sure, 2004).
(Kalfoglou and Schorlemmer, 2003) give a comprehensive review of the methods, the approaches, and the state of the art in ontology aligning.

According to (Haeri et al., 2006): "no [much] work is so far done on the problem of Ontology Alignment or Ontology Matching in which graph theoretic backbone of problem is scrutinized." With the use of graph theory and such modeling, we believe that there is a vast area for new work on the problem of ontology aligning.



3 DEFINITIONS 

In this section we define the notations that are used throughout the paper. We start with the necessary mathematical background.
A graph $G_i$, by definition, consists of two sets, $V(G_i)$ and $E(G_i)$, where $V(G_i)$ is the set of vertices and $E(G_i)$ is the set of edges. The size of a graph is $|V(G_i)|$, which is denoted by $|G_i|$. Let us assume that the labels assigned to nodes are chosen from a finite alphabet $\Sigma$. Let $\lambda \notin \Sigma$ be a null character, and $\Sigma_\lambda = \Sigma \cup \{\lambda\}$.

3.1 Typed Graph 

An ontology $O_i$, in this paper, is considered as a typed graph $G_i$. A typed graph, as defined in (Haeri et al., 2006), is denoted by $G_i(V, E, T)$, where $E$ is of type $E : V \times V \rightarrow T$. Labels in $T$ are from $\Sigma_\lambda$. In such a graph, an edge $e$ between two vertices $v_{i_j}, v_{i_k}$ with type $t$ is denoted by $e(v_{i_j}, v_{i_k}) : t$.
In this paper each ontology $O_i$ is modeled using a typed graph $G_i$, where the concepts in $O_i$ are the nodes of $G_i$, and the relations and properties of $O_i$ are the typed edges of the graph.

3.2 Distance 

The distance of two concepts belonging to two different graphs is described as the distance of their labels in a metric space. Usually this metric distance is described by a distance function, $\delta : (\Sigma_\lambda \times \Sigma_\lambda) \setminus (\lambda, \lambda) \rightarrow \mathbb{R}$. So the distance of two nodes $v_i, v_j$ is denoted by $\delta(label(v_i), label(v_j))$. For simplicity we write it as $\delta(v_i, v_j)$. This metric distance $\delta$, for any $v_1, v_2, v_3 \in \Sigma_\lambda$, should have the following properties (Haeri et al., 2006):

1. $\delta(v_1, v_2) \geq 0$, $\delta(v_1, v_1) = 0$

2. $\delta(v_1, v_2) = \delta(v_2, v_1)$ (Symmetry)

3. $\delta(v_1, v_2) + \delta(v_2, v_3) \geq \delta(v_1, v_3)$ (Triangle inequality)

This distance function can be a string-based distance or any other suitable one. In this research, the distance of two vertices $v_i, v_j$, $\delta(v_i, v_j)$, is taken to be the Levenshtein distance (Levenshtein, 1966) of their labels.
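The label distance can be illustrated with a minimal sketch of the standard dynamic-programming Levenshtein distance. The function name `levenshtein` and the unit cost for every edit operation are assumptions made for this illustration; the paper does not state which implementation was used.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two labels; insert, delete and substitute cost 1 each."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Hypothetical labels, used only to show the three metric properties in action.
print(levenshtein("Hotel", "Hostel"))
```

With unit costs the function is symmetric, returns 0 only for equal labels, and satisfies the triangle inequality, so it meets the three properties listed above.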
Figure 1: A sample matching of two graphs G', G



3.3 Ontology Alignment 

In this section we discuss our own understanding of a one to one matching between ontologies.
A one to one matching¹ of two ontologies $O_{i_1}, O_{i_2}$ is denoted by $m : O_{i_1} \rightarrow O_{i_2}$ and is a one to one correspondence between the nodes of the two graphs $(G_{i_1}, G_{i_2})$ of $O_{i_1}, O_{i_2}$.
(Ehrig and Sure, 2004) define the mapping function in the following way:

• $m : O_{i_1} \rightarrow O_{i_2}$

• $\forall v \in G_{i_1} : m(v) = v'$ if $v' \in G_{i_2}$ and $\delta(v, v') < t$, for $t$ being a threshold

$v'$ is the corresponding node of $v$ under the mapping $m$. It is clear that with this definition the correspondence of edges is determined by the correspondence of nodes:

$\forall e(v_{1_j}, v_{1_k}) : t \in E(G_{i_1})$: if $m(v_{1_j}) = v_{2_j} \neq \emptyset$, $m(v_{1_k}) = v_{2_k} \neq \emptyset$, and $e(v_{2_j}, v_{2_k}) : t \in E(G_{i_2})$, then $m(e(v_{1_j}, v_{1_k})) = e(v_{2_j}, v_{2_k})$.

A sample alignment for two sample ontologies $G', G$ is shown in Figure 1.



3.4 Edge Preservation 

We call an edge $e(v_{1_j}, v_{1_k}) : t \in E(G_{i_1})$ preserved under the matching $m$ iff there is an edge $e(m(v_{1_j}), m(v_{1_k})) : t \in E(G_{i_2})$. In other words, an edge $e$ is preserved under the matching $m$ if and only if $\exists e' : e' = (m(v_{1_j}), m(v_{1_k})),\ m(e) = e'$, and is not preserved otherwise.

The preservation of edges between corresponding nodes is the key to finding an ideal matching. In fact, in an ideal alignment most of the edges of one ontology are preserved in the second one.
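The definition of edge preservation can be sketched in a few lines. The representation below, a dictionary from ordered node pairs to edge types, is an assumption chosen to mirror $E : V \times V \rightarrow T$; the function name `is_preserved` and the sample labels are hypothetical.

```python
from typing import Dict, Tuple

# Assumed encoding of a typed graph: (source, target) -> edge type.
TypedEdges = Dict[Tuple[str, str], str]


def is_preserved(edge: Tuple[str, str], edge_type: str,
                 mapping: Dict[str, str], target_edges: TypedEdges) -> bool:
    """True iff the image of `edge` under `mapping` exists in the target graph with the same type."""
    u, v = edge
    if u not in mapping or v not in mapping:
        return False  # an endpoint without a match cannot yield a preserved edge
    return target_edges.get((mapping[u], mapping[v])) == edge_type


# Hypothetical toy ontologies.
g_edges = {("Hotel", "Room"): "hasPart", ("Hotel", "City"): "locatedIn"}
g2_edges = {("Accommodation", "Room"): "hasPart"}
m = {"Hotel": "Accommodation", "Room": "Room", "City": "Town"}

print(is_preserved(("Hotel", "Room"), "hasPart", m, g2_edges))
print(is_preserved(("Hotel", "City"), "locatedIn", m, g2_edges))
```

In this toy case the `hasPart` edge is preserved, while the `locatedIn` edge is not because the target graph has no edge between the two matched nodes.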



1 We will sometimes call it Alignment or mapping of two 
ontologies 



4 COINCIDENCE BASED 
WEIGHTING 

In this section we introduce and discuss a new weighting model for an alignment, with which we will later design our genetic algorithm. The coincidence based alignment weight function is discussed in detail in (Haeri et al., 2006); here we give a brief introduction to it. Before defining the weight itself, let us discuss the intuition behind it.
Consider a mapping $m$ between two ontologies with graphs $G_{i_1}, G_{i_2}$, and consider two nodes $v_{1_j}, v_{1_k} \in V(G_{i_1})$ and their matches $m(v_{1_j}), m(v_{1_k})$. The weighting system should yield a high weight if $v_{1_j}$ is close to $m(v_{1_j})$, $v_{1_k}$ is close to $m(v_{1_k})$, and in addition $e = (v_{1_j}, v_{1_k}) : t \in E(G_{i_1})$ is preserved under $m$. In this case $v_{1_j}, v_{1_k}$ are close to $m(v_{1_j}), m(v_{1_k})$, and there is an edge both between $v_{1_j}, v_{1_k}$ and between $m(v_{1_j}), m(v_{1_k})$. This case is considered the most desirable one and should be given the highest value.
In the second case, suppose the edge is not preserved. Here a small negative point should be given. The reason for the negative point is that the edge is not preserved, so the structural matching of the graphs is interrupted. In this case the nodes are very close but the edge is missing.
The farther any of the nodes is from its match, the lower the positive value of the matching should be. If the edge is preserved, we give this matching a low positive value. But when the edge is not preserved, it is in fact an undesired matching, so we give it a negative point. In this case not only are the nodes far from their matches, but the edge is also not preserved.
According to the above considerations there are six different categories (suppose $G, G'$ are the graphs of two ontologies $O, O'$ to be aligned, $a, b$ are concepts from $G$, and $a', b'$ are concepts from $G'$):

• Category I. $a$ and $a'$ are very close², and $b, b'$ are close as well. The edge between $a, b$ is preserved, so this category is of much importance, because the two edges coincide almost completely.

• Category II. In this category the edge is preserved but only one of $a$ or $b$ is close to its match. This is good, but not as good as the previous category.

• Category III. The two endpoints of an edge are close to their matches, that is, $a$ is close to $a'$ and $b$ is close to $b'$, but the edge is not preserved. This category should not be penalized much, because at least the concepts are close to their matches and the vertices coincide.

• Category IV. The edge between $a, b$ is not preserved, and $b$ is far from $b'$. The only positive point of such a matching is the fact that $a$ and $a'$ are close.

• Category V and VI. In these categories, both $a, a'$ and $b, b'$ are far from one another, and the difference is in the preservation of the edge. Both cases are undesired and should obtain low points.

2 In terms of the distance function described before.

According to the above categories and discussion, the following weight function is suggested:

$$w(m) = w_0(m) - w_l(m) - w_r(m)$$

$$w_0(m) = \sum_{(v_1, v_2):t \in E(G),\ (m(v_1), m(v_2)):t \in E(G')} f(v_1) + f(v_2)$$

$$w_l(m) = \sum_{(v_1, v_2):t \in E(G),\ (m(v_1), m(v_2)):t \notin E(G')} g(v_1) + g(v_2)$$

$$w_r(m) = \sum_{(v_1, v_2):t \notin E(G),\ (m(v_1), m(v_2)):t \in E(G')} g(v_1) + g(v_2)$$



The functions $f$ and $g$, referred to as Normalization Functions, are of the form:

$$f : \mathbb{R} \rightarrow \mathbb{R}^+$$

$$g : \mathbb{R} \rightarrow \mathbb{R}^+$$

$f, g$ are related to the distance function. $f$ should be a positive decreasing function, so that if $\delta(v, m(v))$ grows, $f$ decreases and reduces the positive point. On the other hand, $g$ should be a positive increasing function that grows with $\delta(v, m(v))$ and increases the negative point for that match. The normalization functions are defined by tuning the system. This will be described again later.
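The weight function above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: graphs are encoded as dictionaries from ordered node pairs to edge types, `mapping` is a one-to-one dictionary, `delta` is any label distance, and `f` and `g` take the distance $\delta(v, m(v))$ directly. Skipping edges whose endpoints are unmapped is also an assumption, since the text does not cover partial mappings.

```python
import math
from typing import Callable, Dict, Tuple

Edges = Dict[Tuple[str, str], str]  # assumed encoding: (source, target) -> edge type


def coincidence_weight(g_edges: Edges, g2_edges: Edges, mapping: Dict[str, str],
                       delta: Callable[[str, str], float],
                       f: Callable[[float], float],
                       g: Callable[[float], float]) -> float:
    """w(m) = w_0(m) - w_l(m) - w_r(m) for a one-to-one mapping."""
    def d(v: str) -> float:
        return delta(v, mapping[v])  # distance of a node to its match

    w0 = wl = wr = 0.0
    for (v1, v2), t in g_edges.items():
        if v1 not in mapping or v2 not in mapping:
            continue  # assumption: edges with unmapped endpoints are ignored
        if g2_edges.get((mapping[v1], mapping[v2])) == t:
            w0 += f(d(v1)) + f(d(v2))  # edge of G preserved in G'
        else:
            wl += g(d(v1)) + g(d(v2))  # edge of G missing in G'
    inverse = {target: source for source, target in mapping.items()}
    for (u1, u2), t in g2_edges.items():
        if u1 in inverse and u2 in inverse:
            v1, v2 = inverse[u1], inverse[u2]
            if g_edges.get((v1, v2)) != t:
                wr += g(d(v1)) + g(d(v2))  # edge of G' with no counterpart in G
    return w0 - wl - wr


# Hypothetical example: identical two-node graphs under the identity mapping.
same = {("a", "b"): "t"}
identity = {"a": "a", "b": "b"}
toy_delta = lambda x, y: 0.0 if x == y else 3.0
f_exp = lambda dist: math.exp(-dist)               # decreasing in the distance
g_exp = lambda dist: math.exp(-max(5, 15 - dist))  # increasing in the distance
print(coincidence_weight(same, same, identity, toy_delta, f_exp, g_exp))
```

The exponential `f_exp` and `g_exp` are only example choices that satisfy the monotonicity requirements stated above; any positive decreasing `f` and positive increasing `g` can be passed instead.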



5 GENETIC ALGORITHM 

This section describes the designed genetic algorithm. Exact matching of two general graphs is not known to be possible in polynomial time, because the problem in its general case is MAX SNP-hard (Arora et al., 1992). A carefully designed randomized search algorithm is therefore a reasonable choice, which led us to the idea of using a genetic algorithm.



5.1 Coding a Matching

To code a matching we used hashmaps (Cormen et al., 2001) in which the keys are concepts of one ontology and the entries are concepts of the other. The entry for each key is the corresponding node of that key in the mapping.
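As a small illustration of this coding, a Python dictionary can play the role of the hashmap. The helper `is_one_to_one` and the sample labels are hypothetical and only show how the one to one property of Section 3.3 can be checked on such a coding.

```python
def is_one_to_one(mapping: dict) -> bool:
    """A coded matching is one to one if no two keys share the same entry."""
    return len(set(mapping.values())) == len(mapping)


# Keys are concepts of the first ontology, entries are their matches in the second.
matching = {"Hotel": "Accommodation", "Room": "Room", "City": "Town"}
print(is_one_to_one(matching))
```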



5.1.1 Pairs 

According to the coincidence-based model, we define a Pair as two concepts from one ontology between which there is a relation (so there is an edge in the graph of that ontology). Figure 1 shows the alignment of two ontologies; $(v_1, v_2), (v_1, v_3), (v_3, v_4), (v_2, v_4)$ are pairs of $G$. A pair also has a weight according to the alignment it is involved in.
More precisely, the weight of a pair is a function of the form

$$P : V \times V \times T \rightarrow \mathbb{R}$$

where $V$ is the set of vertices in the ontology graph $G$ and $T$ is a set of labels in $\Sigma_\lambda$. So an ontology in a matching has a limited number of pairs.
The weight of a pair depends on the alignment in which the ontology is involved. Let $G_{i_1}, G_{i_2}$ be the graphs of two aligned ontologies, and $v_{1_j}, v_{1_k} \in V(G_{i_1})$. Also assume an edge between $v_{1_j}, v_{1_k}$ of type $t$, $e_{1_{jk}} = e(v_{1_j}, v_{1_k}) : t$.

$P(v_{1_j}, v_{1_k}, t)$ in the alignment of two ontologies is given by:

$$P(v_{1_j}, v_{1_k}, t) = \begin{cases} f(v_{1_j}) + f(v_{1_k}) & \text{if } e_{1_{jk}} \text{ is preserved} \\ -\left(g(v_{1_j}) + g(v_{1_k})\right) & \text{if } e_{1_{jk}} \text{ is not preserved} \\ -\infty & \text{if } e_{1_{jk}} \notin E(G_{i_1}) \end{cases}$$

For a couple of concepts that do not form a pair, the value of $P$ is set to $-\infty$. The definition of pairs is needed for the crossover function, which tries to improve the structural matching.

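In the alignment of two ontologies $O_{i_1}(G_{i_1})$, $O_{i_2}(G_{i_2})$, say $m : O_{i_1} \rightarrow O_{i_2}$, we also define the weight of a single concept from one ontology, $W(v_{1_j})$ where $v_{1_j} \in V(G_{i_1})$, as follows:

$$W(v_{1_j}) = \sum_{\forall v \in G_{i_1} :\ e(v_{1_j}, v) : t \in E(G_{i_1})} P(v_{1_j}, v, t), \quad \text{if } v_{1_j} \in V(G_{i_1}),\ m(v_{1_j}) \in V(G_{i_2})$$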



5.1.2 An Example

To make the definition of a pair and its corresponding weights clear, we give an example of how to compute these weights.
In Figure 1 we have:

$P(v_1, v_2, t_2) = f(v_1) + f(v_2)$
$P(v_1, v_3, t_1) = f(v_1) + f(v_3)$
$P(v_3, v_4, t_3) = -g(v_3) - g(v_4)$
$P(v_2, v_4, t_4) = -g(v_2) - g(v_4)$
$P(v_1, v_4, t_i) = P(v_2, v_3, t_i) = -\infty$

$W(v_1) = (f(v_1) + f(v_2)) + (f(v_1) + f(v_3))$
$W(v_2) = (f(v_2) + f(v_1)) - (g(v_2) + g(v_4))$
$W(v_3) = (f(v_3) + f(v_1)) - (g(v_3) + g(v_4))$
$W(v_4) = -(g(v_4) + g(v_3)) - (g(v_4) + g(v_2))$

With these definitions in place, we can now describe the steps of the genetic algorithm.
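The pair and node weights of this example can be reproduced with a short sketch. Figure 1 itself is not available in the text, so the target graph `g2_edges` below is a hypothetical one chosen so that exactly the first two pairs are preserved; the constant `f` and `g` are placeholders that make the arithmetic easy to follow, and the mapping is assumed to cover every node of `G`.

```python
import math


def pair_weight(v1, v2, t, g_edges, g2_edges, mapping, f, g):
    """P(v1, v2, t): -inf if there is no such pair, f-sum if preserved, minus g-sum otherwise."""
    if g_edges.get((v1, v2)) != t:
        return -math.inf
    if g2_edges.get((mapping[v1], mapping[v2])) == t:
        return f(v1) + f(v2)
    return -(g(v1) + g(v2))


def node_weight(v, g_edges, g2_edges, mapping, f, g):
    """W(v): sum of P over every pair that contains v."""
    return sum(pair_weight(a, b, t, g_edges, g2_edges, mapping, f, g)
               for (a, b), t in g_edges.items() if v in (a, b))


# Pairs of G as listed in the example; G' is an assumed stand-in for Figure 1.
g_edges = {("v1", "v2"): "t2", ("v1", "v3"): "t1",
           ("v3", "v4"): "t3", ("v2", "v4"): "t4"}
g2_edges = {("u1", "u2"): "t2", ("u1", "u3"): "t1"}
mapping = {"v1": "u1", "v2": "u2", "v3": "u3", "v4": "u4"}
f = lambda v: 1.0   # placeholder normalization values
g = lambda v: 0.5

for v in ("v1", "v2", "v3", "v4"):
    print(v, node_weight(v, g_edges, g2_edges, mapping, f, g))
```

With these placeholder values, $W(v_1) = 4$, $W(v_2) = W(v_3) = 1$ and $W(v_4) = -2$, which matches the structure of the formulas above: two preserved pairs for $v_1$, one preserved and one lost pair for $v_2$ and $v_3$, and two lost pairs for $v_4$.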

5.2 Initialization

The start population is initialized randomly, with 1000 individuals. The ideal matching can be reached more quickly if the initial individuals are built on the basis of the labels of concepts; that is, if $v_{1_j}$ in $G_{i_1}$ and $v_{2_j}$ in $G_{i_2}$ have the same label, then $v_{1_j}$ is made to correspond to $v_{2_j}$ in the initial mapping.
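A label-seeded individual of this kind could be built as in the following sketch. It assumes that nodes are identified by their labels and that the second ontology has at least as many nodes as the first; the function name `seeded_individual` is hypothetical.

```python
import random


def seeded_individual(source_nodes, target_nodes, rng):
    """Match equal labels to each other, then fill the remaining nodes randomly (one to one)."""
    targets = set(target_nodes)
    mapping = {v: v for v in source_nodes if v in targets}  # identical labels first
    used = set(mapping.values())
    free = [u for u in target_nodes if u not in used]
    rng.shuffle(free)
    for v in source_nodes:
        if v not in mapping and free:
            mapping[v] = free.pop()  # random match for the remaining nodes
    return mapping


# Hypothetical labels; one generator is shared so that individuals differ.
source = ["Hotel", "Room", "City"]
target = ["Room", "Town", "Hotel", "Beach"]
rng = random.Random(0)
population = [seeded_individual(source, target, rng) for _ in range(1000)]
print(population[0])
```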

5.3 Selection 

In each iteration, we sorted the 1000 individuals according to their fitness as described in section 4 (the coincidence based weight function), and selected the 500 best individuals as the parents of the next step. From these 500 individuals, using the crossover and mutation functions (as we will see later), 1000 new individuals are created. These 1000 individuals are sent to the next iteration.

5.4 Crossover 

In the crossover function, single nodes are compared according to their weight. As described before, the weight of a single node in an alignment is the sum of the weights of the pairs in which that node is included.
Consider two parents, each a mapping between the two ontology graphs $G_{i_1}, G_{i_2}$. To make an offspring from the two parents, for every node in $G_{i_1}$, say $v_{1_j}$, the mapping with the larger $W(v_{1_j})$ is copied to the offspring. If $m(v_{1_j})$ in $G_{i_2}$ is already assigned to some other node of $G_{i_1}$, then $v_{1_j}$ is put in a reserved list. At the end of the complete iteration over the nodes of $G_{i_1}$, the nodes in that reserved list are randomly mapped to the unassigned nodes of $G_{i_2}$. The random assignment is not done in the middle of the iteration, to prevent nodes of $G_{i_2}$ from being assigned to random nodes when they could be assigned to better nodes later in the iteration. So this random assignment is postponed until all nodes of $G_{i_1}$ have been examined for mapping to nodes in $G_{i_2}$.
As an example, suppose $v_{1_m} \in V(G_{i_1})$ should map to $v_{2_m} \in V(G_{i_2})$, and $v_{2_m}$ is already mapped by some other node from $G_{i_1}$. If at that time we assigned $v_{1_m}$ to some random node like $v_{2_n} \in V(G_{i_2})$, this would prevent a possibly good mapping of $v_{1_n}$ to $v_{2_n}$ later in the iteration, because $v_{2_n}$ would already be assigned. So this random matching is delayed until no more assignment is possible.
In Figure 2 two mappings between two ontologies $O, O'$ are shown, and we want to decide the match node for $v_i \in V(G)$ in the offspring. In parent 1 we have:
$W(v_i) = P(v_i, v_j, t_2) + P(v_i, v_k, t_1) = (f(v_i) + f(v_j)) + (f(v_i) + f(v_k))$
and in parent 2 we have:
$W(v_i) = P(v_i, v_j, t_2) + P(v_i, v_k, t_1) = -(g(v_i) + g(v_j)) + (f(v_i) + f(v_k))$
$f, g$ are positive functions, so the value of $W(v_i)$ in parent 1 (Figure 2 (a)) is greater than that in parent 2 (Figure 2 (b)). So, as can be seen, the corresponding node of $v_i \in V(G)$ in the offspring is chosen by the mapping from parent 1, and therefore is $v'_i \in V(G')$.
This kind of crossover seems reasonable because the mapping of a single node in the offspring is not worse than that in the two parents. Under this assumption, little by little, the mappings of nodes will converge to ideal ones.

Figure 2: Crossover. (a) part of parent 1 matching, (b) part of parent 2 matching, (c) part of offspring

5.5 Mutation

With some probability, which differs between situations, a proportion of the population is involved in mutation. In the mutation of a mapping of two ontologies with graphs $G_{i_1}, G_{i_2}$, two random nodes from $G_{i_1}$ are chosen, and their matches in $G_{i_2}$ are swapped.

Let $v_{1_j}, v_{1_k} \in V(G_{i_1})$ be chosen randomly, and let $m(v_{1_j}) = v_{2_j} \in V(G_{i_2})$, $m(v_{1_k}) = v_{2_k} \in V(G_{i_2})$. In the mutation process we just swap the match nodes of the selected ones. So the new mapping will be $m(v_{1_j}) = v_{2_k} \in V(G_{i_2})$, $m(v_{1_k}) = v_{2_j} \in V(G_{i_2})$.

5.6 Continuation

The 500 best children from the previous step are sorted in decreasing order of fitness and form the parents of the current step. These parents produce 1000 new individuals, and the best 500 of these 1000 children are selected as the parents of the next step. Parents $i$ and $i + 1$ take part in crossover and mutation and create offspring. From the set of these offspring and their two parents, the two best are chosen as children. The last parent is combined with the first one to produce the last offspring. As stated before, after the necessary 1000 children are made, they are sorted according to the fitness function and the best 500 are selected for the next step (Figure 3).
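The crossover of Section 5.4 and the mutation of Section 5.5 can be sketched as below. The sketch assumes that both parents map the same set of source nodes, that the node weights $W$ of each parent have been precomputed into the dictionaries `weight1` and `weight2`, and that the target ontology has at least as many nodes as the source. Breaking ties in favour of parent 1 is also an assumption.

```python
import random


def crossover(parent1, parent2, weight1, weight2, target_nodes, rng):
    """Per node, inherit the match from the parent in which that node has the larger weight."""
    child, used, reserved = {}, set(), []
    for v in parent1:
        match = parent1[v] if weight1[v] >= weight2[v] else parent2[v]
        if match in used:
            reserved.append(v)  # conflict: postpone until every node was examined
        else:
            child[v] = match
            used.add(match)
    free = [u for u in target_nodes if u not in used]
    rng.shuffle(free)
    for v in reserved:          # postponed random assignment
        child[v] = free.pop()
    return child


def mutate(mapping, rng):
    """Swap the matches of two randomly chosen source nodes."""
    mutated = dict(mapping)
    a, b = rng.sample(list(mutated), 2)
    mutated[a], mutated[b] = mutated[b], mutated[a]
    return mutated


# Hypothetical parents and precomputed node weights.
p1 = {"a": "x", "b": "y", "c": "z"}
p2 = {"a": "y", "b": "x", "c": "z"}
w1 = {"a": 2.0, "b": -1.0, "c": 0.0}
w2 = {"a": 1.0, "b": 3.0, "c": 0.0}
rng = random.Random(0)
child = crossover(p1, p2, w1, w2, ["x", "y", "z"], rng)
print(child, mutate(child, rng))
```

In the toy run, node `b` prefers the match `x` from parent 2, but `x` was already taken by `a`, so `b` goes to the reserved list and receives the only free node `y` after the pass.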

5.7 End Condition 

To end the iteration of the GA, we used a threshold for convergence. The sequential GA continued until the best mapping among all individuals in the population had not improved for more than 15 steps. That mapping is reported as the answer, and the alignment of the two ontologies is then finalized.
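Putting selection, continuation and the end condition together, the generational loop might look like the sketch below. The callables `fitness`, `breed` and `mutate` are placeholders supplied by the caller, and mutating the lower half of the children with a fixed rate is just one of the configurations described in Section 6.2; none of the names come from the paper.

```python
import random


def run_ga(population, fitness, breed, mutate, rng, mutation_rate=0.5, patience=15):
    """Keep the best half, breed neighbouring parents, stop after `patience` stale steps."""
    half = len(population) // 2
    best, best_fit, stale = None, float("-inf"), 0
    while stale <= patience:
        parents = sorted(population, key=fitness, reverse=True)[:half]
        top_fit = fitness(parents[0])
        if top_fit > best_fit:
            best, best_fit, stale = parents[0], top_fit, 0
        else:
            stale += 1
        population = []
        for i, p in enumerate(parents):
            q = parents[(i + 1) % half]        # the last parent pairs with the first
            family = sorted([p, q, breed(p, q)], key=fitness, reverse=True)
            population.extend(family[:2])      # the two best become children
        population.sort(key=fitness, reverse=True)
        for k in range(half, len(population)):  # mutate the lower half only
            if rng.random() < mutation_rate:
                population[k] = mutate(population[k])
    return best


# Toy problem with integers as individuals, only to exercise the loop.
rng = random.Random(1)
start = [rng.randrange(0, 100) for _ in range(20)]
toy_fitness = lambda x: -abs(x - 42)
result = run_ga(start, toy_fitness,
                breed=lambda p, q: (p + q) // 2,
                mutate=lambda x: x + rng.choice([-1, 1]),
                rng=rng)
print(result)
```

For the ontology setting, the individuals would be mapping dictionaries, `fitness` the coincidence based weight, and `breed` and `mutate` the operators of Sections 5.4 and 5.5. The sketch recomputes fitness on every comparison; caching it per individual would be a natural refinement.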



6 EVALUATION 

To evaluate our genetic algorithm, we designed three kinds of experiments. In the first experiment, we tested the genetic algorithm with different mutation probabilities. In the second experiment, we tried to align two identical ontologies (in fact, we aligned one ontology with itself). This experiment helped us examine the efficiency and accuracy of the algorithm when the two ontologies are more similar to each other. To verify our contribution, we added a third experiment, in which we used a naive local search alignment method.

Figure 3: Population generation in each step of GA

6.1 Limitations 

The coincidence-based ontology matching rests on an innovative idea; however, the method has essential limitations. The most important one is the availability of ontologies and test collections. Most of them do not have a large taxonomic structure, so the method has little structure to exploit on them. To test this method we needed large taxonomic ontologies. We found the "Tourism" ontologies (tou, ) to be a suitable test bed, with approximately 340 classes and concepts.

6.2 Various Experiments 
Characteristics 

For the tourism ontologies, an ideal alignment was given, so we had a reference alignment for our graphs. The precision measure (Baeza-Yates and Ribeiro-Neto, 1999) was calculated based on this given information.



• Experiment 1

As discussed previously, in this experiment we aligned "TourismA" with "TourismB". This is the main experiment for our method, and checks the efficiency of our coincidence-based genetic algorithm.

In this experiment, from each two parents we made one offspring using the crossover function. From the three individuals (the two parents and the offspring) we chose the two best as the children of this amalgamation.
The normalization functions are as below:

$$f(v) = \frac{1}{e^{\delta(v, m(v))}}$$

$$g(v) = \frac{1}{e^{\max(5,\ 15 - \delta(v, m(v)))}}$$

These functions satisfy the characteristics expected from $f, g$ (explained in Sec. 4). $f$ is a decreasing function and decreases with the growth of $\delta$, and $g$ is increasing. Exponential functions were chosen for $f, g$ so that $f, g$ would have close and comparable values. In fact, these functions match the discussion of positive and negative points for the different categories of a coincidence based weight.

- Experiment 1.1

After the 1000 individuals were created, we mutated the lower half of them (with the mutation function described before) with a probability of 0.7.

- Experiment 1.2

After the 1000 individuals were created, we mutated the lower half of them (with the mutation function described before) with a probability of 0.3.

- Experiment 1.3

Mutation was applied to every individual among all of the 1000 children with a probability of 0.5.

• Experiment 2

In this experiment we align "TourismA" with itself. This is a verification that shows how efficiently the genetic algorithm works when the two ontologies are more similar and in fact more coincident.

The generation scheme is similar to the previous experiment, and mutation was applied to the lower half of the individuals with a probability of 0.5.

The normalization functions are the same as in the previous experiment:

$$f(v) = \frac{1}{e^{\delta(v, m(v))}}$$

$$g(v) = \frac{1}{e^{\max(5,\ 15 - \delta(v, m(v)))}}$$

Figure 4: Precision result of experiments
Figure 5: Convergence of results in GA



• Experiment 3

This experiment provides a baseline comparison of the GA method with a naive local search method. In this part, we implemented a naive hill-climbing local search method. As the starting point, we made an initial alignment in which all concepts in "TourismA" were matched with concepts in "TourismB". For a node $v_j$ in TourismA, if there was a node $v'_j$ with label $label(v_j)$ in TourismB, we matched $v_j$ with $v'_j$. Otherwise we mapped $v_j$ to a random node of TourismB.

After that, in each iteration, the best single change (mutation) was performed to improve the weight value of the alignment. We iterated the method for almost 1000 steps, at which point the results had not improved for more than 15 steps.
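The hill-climbing baseline of Experiment 3 can be sketched as follows, assuming that a "single change" is the swap used by the mutation function and that `weight` is any callable scoring a mapping. As a simplification, the sketch stops at the first step in which no swap improves the weight, rather than waiting for 15 stale steps; both the function name and the toy weight are hypothetical.

```python
from itertools import combinations


def hill_climb(mapping, weight, max_steps=1000):
    """Greedy local search: apply the best improving swap until none exists."""
    current = dict(mapping)
    current_w = weight(current)
    for _ in range(max_steps):
        best_swap, best_w = None, current_w
        for a, b in combinations(list(current), 2):
            current[a], current[b] = current[b], current[a]  # try the swap
            w = weight(current)
            current[a], current[b] = current[b], current[a]  # undo it
            if w > best_w:
                best_swap, best_w = (a, b), w
        if best_swap is None:
            break  # local optimum: no single swap improves the weight
        a, b = best_swap
        current[a], current[b] = current[b], current[a]
        current_w = best_w
    return current


# Toy weight: number of nodes matched to their upper-case label.
toy_weight = lambda m: sum(1 for v, u in m.items() if u == v.upper())
start = {"a": "B", "b": "C", "c": "A"}
print(hill_climb(start, toy_weight))
```

Each step evaluates every pair of source nodes, so one step costs a quadratic number of weight evaluations; this is what makes the method a naive baseline.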

6.3 Results 

Figure 4 shows the results of the above experiments in terms of precision (Baeza-Yates and Ribeiro-Neto, 1999). As shown, with equal graphs the genetic algorithm finds the best matching and precision is 1.
In our experiments, the distance threshold discussed in section 3.3 is set to 4. We chose this number by experience; however, there could be other ways to determine it, such as machine learning techniques. The discussion of how to define this threshold and the distance function is beyond the scope of this paper. With the other experiments, however, the result is slightly inaccurate in comparison with the ideal alignment, and precision is approximately 0.8.

We also investigated the number of iterations and the convergence of the result of this genetic algorithm for Experiment 1. The results are shown in Figure 5.
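For reference, precision against a given ideal alignment can be computed as in this minimal sketch. It assumes both alignments are stored as dictionaries from source concepts to target concepts; the function name and the sample data are hypothetical.

```python
def precision(found: dict, reference: dict) -> float:
    """Fraction of the returned correspondences that also occur in the reference alignment."""
    if not found:
        return 0.0
    correct = sum(1 for v, u in found.items() if reference.get(v) == u)
    return correct / len(found)


# Hypothetical alignments: four of the five returned correspondences are correct.
ideal = {"a": "x", "b": "y", "c": "z", "d": "w", "e": "p"}
found = {"a": "x", "b": "y", "c": "z", "d": "w", "e": "q"}
print(precision(found, ideal))
```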



7 CONCLUSION AND FUTURE 
WORK 

The genetic algorithm seems efficient for the problem of ontology alignment extraction. It also converges rapidly, for example after approximately 40 iterations in our experiments. This number can be reduced further by choosing a biased initial population, where labels are used to choose better initial mappings.
The coincidence based approach, when improved and used as the fitness function of a genetic algorithm, might be useful when ontologies have a more taxonomic structure. Genetic algorithms also have some weaknesses. One of them is the dependency of the results on the initial population. The more important weakness arises when the two ontologies have the form of sparse graphs or even forests. In that case the search domain for crossover is not a smooth one, and small changes to an individual in crossover or mutation might take it to a very distant point, which most of the time is far from the goal.
Work is now being done on tree ontologies and the relations in them. Once we can align tree ontologies, we can model ontologies as trees and align these trees. We are also interested in extending our theory and mechanisms to match ontologies based on their shapes, graph areas, etc.